OCR Performance Prediction using a Bag of Allographs and Support Vector Regression
Internal identifier: 000074 (Main/Exploration); previous: 000073; next: 000075
Authors: Tapan Bhowmik; Thierry Paquet [France]; Nicolas Ragot [France]
Source:
English descriptors
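- mix: Bag of Allographs, Historical Documents, OCR Performance Prediction, Support Vector Regression (SVR), Template Matching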
Abstract
In this paper, we describe a novel and simple technique for prediction of OCR results without using any OCR. The technique uses a bag of allographs to characterize textual components. Then a support vector regression (SVR) technique is used to build a predictor based on the bag of allographs. The performance of the system is evaluated on a corpus of historical documents. The proposed technique produces correct prediction of OCR results on training and test documents within the range of standard deviation of 4.18% and 6.54% respectively. The proposed system has been designed as a tool to assist selection of corpora in libraries and specify the typical performance that can be expected on the selection.
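The first step named in the abstract, characterizing the textual components of a page by a bag of allographs, can be illustrated with a minimal sketch. The abstract does not specify the features, the codebook, or the matching rule, so everything below is an assumption made for illustration: components are small numeric feature vectors, the allograph codebook is given, and each component is assigned to its nearest codebook entry by Euclidean distance. The function names and toy data are hypothetical.

```python
import math
from collections import Counter

def nearest_allograph(feature, codebook):
    """Index of the codebook entry closest to `feature` (Euclidean distance, an assumption)."""
    return min(range(len(codebook)), key=lambda i: math.dist(feature, codebook[i]))

def bag_of_allographs(components, codebook):
    """Normalized histogram of allograph assignments for one page."""
    counts = Counter(nearest_allograph(f, codebook) for f in components)
    total = len(components)
    if total == 0:
        # A page with no components gets an all-zero histogram.
        return [0.0] * len(codebook)
    return [counts[i] / total for i in range(len(codebook))]

# Toy data (hypothetical): three allograph prototypes in a 2-D feature space.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Four components extracted from one page (hypothetical feature vectors).
page = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]

histogram = bag_of_allographs(page, codebook)
print(histogram)
```

In a full pipeline, one such histogram per document would serve as the input vector of a regressor, for example a support vector regression model trained against measured OCR accuracy. That step is not shown here.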
Url: https://hal.archives-ouvertes.fr/hal-01085002
DOI: 10.1109/DAS.2014.72
Affiliations:
- France
- Centre-Val de Loire, Région Bourgogne, Région Centre
- Rouen, Tours
- Centre Val de Loire Université, Université François-Rabelais de Tours, Université de Rouen
Links toward previous steps (curation, corpus...)
- to stream Hal, to step Corpus: 000088
- to stream Hal, to step Curation: 000088
- to stream Hal, to step Checkpoint: 000035
- to stream Main, to step Merge: 000075
- to stream Main, to step Curation: 000074
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">OCR Performance Prediction using a Bag of Allographs and Support Vector Regression</title>
<author><name sortKey="Bhowmik, Tapan" sort="Bhowmik, Tapan" uniqKey="Bhowmik T" first="Tapan" last="Bhowmik">Tapan Bhowmik</name>
</author>
<author><name sortKey="Paquet, Thierry" sort="Paquet, Thierry" uniqKey="Paquet T" first="Thierry" last="Paquet">Thierry Paquet</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-23832" status="VALID"><orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc><address><addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation><relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300317" type="direct"><org type="institution" xml:id="struct-300317" status="VALID"><orgName>Université du Havre</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="direct"><org type="institution" xml:id="struct-300318" status="VALID"><orgName>Université de Rouen</orgName>
<desc><address><addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="direct"><org type="department" xml:id="struct-301288" status="VALID"><orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect"><org type="institution" xml:id="struct-301232" status="VALID"><orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Région Bourgogne</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
<author><name sortKey="Ragot, Nicolas" sort="Ragot, Nicolas" uniqKey="Ragot N" first="Nicolas" last="Ragot">Nicolas Ragot</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-204893" status="VALID"><orgName>Laboratoire d'Informatique de l'Université de Tours</orgName>
<orgName type="acronym">LI</orgName>
<desc><address><addrLine>64, Avenue Jean Portalis, 37200 Tours</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.li.univ-tours.fr/</ref>
</desc>
<listRelation><relation active="#struct-300408" type="direct"></relation>
<relation name="EA6300" active="#struct-300298" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300408" type="direct"><org type="institution" xml:id="struct-300408" status="VALID"><orgName>Polytech'Tours</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA6300" active="#struct-300298" type="direct"><org type="institution" xml:id="struct-300298" status="VALID"><orgName>Université François Rabelais - Tours</orgName>
<desc><address><addrLine>60 rue du Plat d'Étain, 37020 Tours cedex 1 </addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-tours.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Tours</settlement>
<region type="old region" nuts="2">Région Centre</region>
<region type="region" nuts="2">Centre-Val de Loire</region>
</placeName>
<orgName type="university">Université François-Rabelais de Tours</orgName>
<orgName type="institution" wicri:auto="newGroup">Centre Val de Loire Université</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-01085002</idno>
<idno type="halId">hal-01085002</idno>
<idno type="halUri">https://hal.archives-ouvertes.fr/hal-01085002</idno>
<idno type="url">https://hal.archives-ouvertes.fr/hal-01085002</idno>
<idno type="doi">10.1109/DAS.2014.72</idno>
<date when="2014-04-07">2014-04-07</date>
<idno type="wicri:Area/Hal/Corpus">000088</idno>
<idno type="wicri:Area/Hal/Curation">000088</idno>
<idno type="wicri:Area/Hal/Checkpoint">000035</idno>
<idno type="wicri:Area/Main/Merge">000075</idno>
<idno type="wicri:Area/Main/Curation">000074</idno>
<idno type="wicri:Area/Main/Exploration">000074</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">OCR Performance Prediction using a Bag of Allographs and Support Vector Regression</title>
<author><name sortKey="Bhowmik, Tapan" sort="Bhowmik, Tapan" uniqKey="Bhowmik T" first="Tapan" last="Bhowmik">Tapan Bhowmik</name>
</author>
<author><name sortKey="Paquet, Thierry" sort="Paquet, Thierry" uniqKey="Paquet T" first="Thierry" last="Paquet">Thierry Paquet</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-23832" status="VALID"><orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc><address><addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation><relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300317" type="direct"><org type="institution" xml:id="struct-300317" status="VALID"><orgName>Université du Havre</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="direct"><org type="institution" xml:id="struct-300318" status="VALID"><orgName>Université de Rouen</orgName>
<desc><address><addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="direct"><org type="department" xml:id="struct-301288" status="VALID"><orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect"><org type="institution" xml:id="struct-301232" status="VALID"><orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Région Bourgogne</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
<author><name sortKey="Ragot, Nicolas" sort="Ragot, Nicolas" uniqKey="Ragot N" first="Nicolas" last="Ragot">Nicolas Ragot</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-204893" status="VALID"><orgName>Laboratoire d'Informatique de l'Université de Tours</orgName>
<orgName type="acronym">LI</orgName>
<desc><address><addrLine>64, Avenue Jean Portalis, 37200 Tours</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.li.univ-tours.fr/</ref>
</desc>
<listRelation><relation active="#struct-300408" type="direct"></relation>
<relation name="EA6300" active="#struct-300298" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300408" type="direct"><org type="institution" xml:id="struct-300408" status="VALID"><orgName>Polytech'Tours</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA6300" active="#struct-300298" type="direct"><org type="institution" xml:id="struct-300298" status="VALID"><orgName>Université François Rabelais - Tours</orgName>
<desc><address><addrLine>60 rue du Plat d'Étain, 37020 Tours cedex 1 </addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-tours.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Tours</settlement>
<region type="old region" nuts="2">Région Centre</region>
<region type="region" nuts="2">Centre-Val de Loire</region>
</placeName>
<orgName type="university">Université François-Rabelais de Tours</orgName>
<orgName type="institution" wicri:auto="newGroup">Centre Val de Loire Université</orgName>
</affiliation>
</author>
</analytic>
<idno type="DOI">10.1109/DAS.2014.72</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="mix" xml:lang="en"><term>Bag of Allographs</term>
<term>Historical Documents</term>
<term>OCR Performance Prediction</term>
<term>Support Vector Regression (SVR)</term>
<term>Template Matching</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">In this paper, we describe a novel and simple technique for prediction of OCR results without using any OCR. The technique uses a bag of allographs to characterize textual components. Then a support vector regression (SVR) technique is used to build a predictor based on the bag of allographs. The performance of the system is evaluated on a corpus of historical documents. The proposed technique produces correct prediction of OCR results on training and test documents within the range of standard deviation of 4.18% and 6.54% respectively. The proposed system has been designed as a tool to assist selection of corpora in libraries and specify the typical performance that can be expected on the selection.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
<region><li>Centre-Val de Loire</li>
<li>Région Bourgogne</li>
<li>Région Centre</li>
</region>
<settlement><li>Rouen</li>
<li>Tours</li>
</settlement>
<orgName><li>Centre Val de Loire Université</li>
<li>Université François-Rabelais de Tours</li>
<li>Université de Rouen</li>
</orgName>
</list>
<tree><noCountry><name sortKey="Bhowmik, Tapan" sort="Bhowmik, Tapan" uniqKey="Bhowmik T" first="Tapan" last="Bhowmik">Tapan Bhowmik</name>
</noCountry>
<country name="France"><region name="Région Bourgogne"><name sortKey="Paquet, Thierry" sort="Paquet, Thierry" uniqKey="Paquet T" first="Thierry" last="Paquet">Thierry Paquet</name>
</region>
<name sortKey="Ragot, Nicolas" sort="Ragot, Nicolas" uniqKey="Ragot N" first="Nicolas" last="Ragot">Nicolas Ragot</name>
</country>
</tree>
</affiliations>
</record>
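As shown here, the record uses the `wicri:` and `hal:` prefixes without any `xmlns` declaration, so a strict XML parser rejects it with an unbound-prefix error. The following is a minimal sketch of one way to read such a record with Python's standard library: wrap it in an element that declares both prefixes. The namespace URIs and the `wrapper` element are placeholders chosen for illustration, not the real Wicri or HAL namespaces, and the sample is an abbreviated excerpt of the record above.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): the real ones are not given in the record.
PLACEHOLDER_NS = {"wicri": "urn:example:wicri", "hal": "urn:example:hal"}

def parse_record(record_xml):
    """Parse a <record> fragment whose wicri:/hal: prefixes are undeclared."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    wrapped = f"<wrapper {decls}>{record_xml}</wrapper>"
    return ET.fromstring(wrapped).find("record")

def summarize(record):
    """Extract the title, DOI and keyword terms from a parsed record."""
    return {
        "title": record.findtext(".//titleStmt/title"),
        "doi": record.findtext(".//publicationStmt/idno[@type='doi']"),
        "keywords": [term.text for term in record.iter("term")],
    }

# Abbreviated excerpt of the record above, kept small for the example.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">OCR Performance Prediction using a Bag of Allographs and Support Vector Regression</title>
<author><name>Thierry Paquet</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory"><orgName type="acronym">LITIS</orgName>
</hal:affiliation></affiliation></author>
</titleStmt>
<publicationStmt><idno type="doi">10.1109/DAS.2014.72</idno>
</publicationStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="mix" xml:lang="en"><term>Bag of Allographs</term>
<term>Support Vector Regression (SVR)</term>
</keywords></textClass></profileDesc>
</teiHeader></TEI></record>"""

info = summarize(parse_record(SAMPLE))
print(info)
```

The wrapper approach assumes the fragment carries no XML declaration of its own. Elements with a prefix end up under the placeholder URIs, so lookups for them need the `{urn:example:hal}` form or a namespace map.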
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000074 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000074 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Hal:hal-01085002 |texte= OCR Performance Prediction using a Bag of Allographs and Support Vector Regression }}
This area was generated with Dilib version V0.6.32.